Overview

Dataset statistics

Number of variables: 18
Number of observations: 1017209
Missing cells: 2173431
Missing cells (%): 11.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 147.5 MiB
Average record size in memory: 152.0 B
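
The two derived figures in this table can be cross-checked from the raw counts. The sketch below assumes that "Missing cells (%)" is the share of empty cells over all cells (variables times observations) and that 1 MiB is 1024 * 1024 bytes; both are assumptions about how the report computes them, not something the report states.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 18
n_observations = 1_017_209
missing_cells = 2_173_431
total_size_mib = 147.5

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"{missing_pct:.1f}%")        # 11.9%
print(f"{avg_record_bytes:.1f} B")  # 152.0 B
```

Both values agree with the reported 11.9% and 152.0 B after rounding.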

Variable types

Numeric: 9
Categorical: 9

Warnings

Date has a high cardinality: 942 distinct values (High cardinality)
DayOfWeek is highly correlated with Open (High correlation)
Sales is highly correlated with Customers and 1 other field (High correlation)
Customers is highly correlated with Sales and 1 other field (High correlation)
Open is highly correlated with DayOfWeek and 2 other fields (High correlation)
Open is highly correlated with Sales and 1 other field (High correlation)
StoreType is highly correlated with Assortment (High correlation)
Sales is highly correlated with Customers and 2 other fields (High correlation)
Promo2SinceWeek is highly correlated with CompetitionOpenSinceYear and 2 other fields (High correlation)
Assortment is highly correlated with StoreType and 1 other field (High correlation)
Open is highly correlated with Sales and 2 other fields (High correlation)
Promo is highly correlated with Sales (High correlation)
CompetitionOpenSinceYear is highly correlated with Promo2SinceWeek (High correlation)
Promo2SinceYear is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
StateHoliday is highly correlated with Open (High correlation)
PromoInterval is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
Promo2 is highly correlated with PromoInterval (High correlation)
Assortment is highly correlated with StoreType (High correlation)
PromoInterval is highly correlated with Promo2 (High correlation)
CompetitionOpenSinceMonth has 323348 (31.8%) missing values (Missing)
CompetitionOpenSinceYear has 323348 (31.8%) missing values (Missing)
Promo2SinceWeek has 508031 (49.9%) missing values (Missing)
Promo2SinceYear has 508031 (49.9%) missing values (Missing)
PromoInterval has 508031 (49.9%) missing values (Missing)
Sales has 172871 (17.0%) zeros (Zeros)
Customers has 172869 (17.0%) zeros (Zeros)
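
As a consistency check, the per-column missing counts can be summed and compared with the "Missing cells" total in the dataset statistics. This is a minimal sketch using only numbers printed in this report. CompetitionDistance is not flagged in the warnings, so its count (2642) is taken from its entry in the Variables section.

```python
# Missing-value counts per column, copied from this report.
missing_by_column = {
    "CompetitionDistance": 2_642,  # from the Variables section, not a warning
    "CompetitionOpenSinceMonth": 323_348,
    "CompetitionOpenSinceYear": 323_348,
    "Promo2SinceWeek": 508_031,
    "Promo2SinceYear": 508_031,
    "PromoInterval": 508_031,
}
n_observations = 1_017_209

total_missing = sum(missing_by_column.values())
print(total_missing)  # 2173431

# Share of missing rows per column, rounded the way the report shows it.
shares = {
    name: round(100 * count / n_observations, 1)
    for name, count in missing_by_column.items()
}
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The sum matches the 2173431 missing cells reported in the overview.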

Reproduction

Analysis started: 2021-07-18 23:46:02.318603
Analysis finished: 2021-07-18 23:48:51.114423
Duration: 2 minutes and 48.8 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
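
The reported duration follows from the two timestamps. A small sketch with the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-07-18 23:46:02.318603", FMT)
finished = datetime.strptime("2021-07-18 23:48:51.114423", FMT)

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.1f} seconds")
# 2 minutes and 48.8 seconds
```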

Variables

Store
Real number (ℝ≥0)

Distinct: 1115
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 558.4297268
Minimum: 1
Maximum: 1115
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 56
Q1: 280
Median: 558
Q3: 838
95-th percentile: 1060
Maximum: 1115
Range: 1114
Interquartile range (IQR): 558

Descriptive statistics

Standard deviation: 321.9086511
Coefficient of variation (CV): 0.5764532862
Kurtosis: -1.200523741
Mean: 558.4297268
Median Absolute Deviation (MAD): 279
Skewness: -0.000954879981
Sum: 568039744
Variance: 103625.1797
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                 Count     Frequency (%)
1023                  942       0.1%
666                   942       0.1%
675                   942       0.1%
163                   942       0.1%
674                   942       0.1%
162                   942       0.1%
673                   942       0.1%
161                   942       0.1%
672                   942       0.1%
160                   942       0.1%
Other values (1105)   1007789   99.1%

Smallest values

Value                 Count     Frequency (%)
1                     942       0.1%
2                     942       0.1%
3                     942       0.1%
4                     942       0.1%
5                     942       0.1%
6                     942       0.1%
7                     942       0.1%
8                     942       0.1%
9                     942       0.1%
10                    942       0.1%

Largest values

Value                 Count     Frequency (%)
1115                  942       0.1%
1114                  942       0.1%
1113                  942       0.1%
1112                  942       0.1%
1111                  942       0.1%
1110                  942       0.1%
1109                  758       0.1%
1108                  942       0.1%
1107                  758       0.1%
1106                  942       0.1%
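
Several of the Store statistics are derived from others in the same entry, which gives a quick way to sanity-check the report. The sketch below uses only the figures listed for Store and the textbook definitions (CV as standard deviation over mean, variance as squared standard deviation, IQR as Q3 minus Q1). Whether pandas-profiling computes them exactly this way is an assumption.

```python
# Figures copied from the Store entry in this report.
mean = 558.4297268
std = 321.9086511
minimum, q1, q3, maximum = 1, 280, 838, 1115

cv = std / mean                  # coefficient of variation
variance = std ** 2              # squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"{cv:.4f} {variance:.1f} {iqr} {value_range}")
# 0.5765 103625.2 558 1114
```

All four agree with the reported CV (0.5764532862), variance (103625.1797), IQR (558), and range (1114).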

DayOfWeek
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.998340557
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.997390965
Coefficient of variation (CV): 0.4995549869
Kurtosis: -1.246873339
Mean: 3.998340557
Median Absolute Deviation (MAD): 2
Skewness: 0.001592822804
Sum: 4067148
Variance: 3.989570667
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common Values

Value                 Count     Frequency (%)
5                     145845    14.3%
4                     145845    14.3%
3                     145665    14.3%
2                     145664    14.3%
7                     144730    14.2%
6                     144730    14.2%
1                     144730    14.2%

Smallest values

Value                 Count     Frequency (%)
1                     144730    14.2%
2                     145664    14.3%
3                     145665    14.3%
4                     145845    14.3%
5                     145845    14.3%
6                     144730    14.2%
7                     144730    14.2%

Largest values

Value                 Count     Frequency (%)
7                     144730    14.2%
6                     144730    14.2%
5                     145845    14.3%
4                     145845    14.3%
3                     145665    14.3%
2                     145664    14.3%
1                     144730    14.2%

Date
Categorical

HIGH CARDINALITY

Distinct: 942
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

2013-04-07: 1115
2013-02-21: 1115
2014-02-02: 1115
2013-11-12: 1115
2015-05-31: 1115
Other values (937): 1011634

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 10172090
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
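
To illustrate what a per-category character count looks like, here is a minimal sketch using Python's `unicodedata` module. It is not the code pandas-profiling runs; `category_counts` is a hypothetical helper, and the input is five copies of the date string 2015-07-31 that this report lists as sample rows for Date. The two-letter codes `Nd` and `Pd` are the Unicode short names for "Decimal Number" and "Dash Punctuation".

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Five copies of the sample value listed for the Date column.
sample = ["2015-07-31"] * 5
counts = category_counts(sample)
print(counts["Nd"], counts["Pd"])  # 40 10
```

Each date has eight digits and two hyphens, which is consistent with the 80.0% / 20.0% split between Decimal Number and Dash Punctuation reported for this column.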

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2015-07-31
2nd row: 2015-07-31
3rd row: 2015-07-31
4th row: 2015-07-31
5th row: 2015-07-31

Common Values

Value                 Count     Frequency (%)
2013-04-07            1115      0.1%
2013-02-21            1115      0.1%
2014-02-02            1115      0.1%
2013-11-12            1115      0.1%
2015-05-31            1115      0.1%
2013-07-20            1115      0.1%
2014-03-08            1115      0.1%
2015-07-07            1115      0.1%
2014-03-13            1115      0.1%
2015-04-11            1115      0.1%
Other values (932)    1006059   98.9%

Length

Histogram of lengths of the category

Value                 Count     Frequency (%)
2013-04-07            1115      0.1%
2013-02-21            1115      0.1%
2014-02-02            1115      0.1%
2013-11-12            1115      0.1%
2015-05-31            1115      0.1%
2013-07-20            1115      0.1%
2014-03-08            1115      0.1%
2015-07-07            1115      0.1%
2014-03-13            1115      0.1%
2015-04-11            1115      0.1%
Other values (932)    1006059   98.9%

Most occurring characters

Value                 Count     Frequency (%)
0                     2307842   22.7%
-                     2034418   20.0%
1                     1825657   17.9%
2                     1606379   15.8%
3                     660614    6.5%
4                     574660    5.6%
5                     440530    4.3%
6                     200805    2.0%
7                     198570    2.0%
8                     164005    1.6%

Most occurring categories

Value                 Count     Frequency (%)
Decimal Number        8137672   80.0%
Dash Punctuation      2034418   20.0%

Most frequent character per category

Decimal Number
Value                 Count     Frequency (%)
0                     2307842   28.4%
1                     1825657   22.4%
2                     1606379   19.7%
3                     660614    8.1%
4                     574660    7.1%
5                     440530    5.4%
6                     200805    2.5%
7                     198570    2.4%
8                     164005    2.0%
9                     158610    1.9%
Dash Punctuation
Value                 Count     Frequency (%)
-                     2034418   100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Common                10172090  100.0%

Most frequent character per script

Common
Value                 Count     Frequency (%)
0                     2307842   22.7%
-                     2034418   20.0%
1                     1825657   17.9%
2                     1606379   15.8%
3                     660614    6.5%
4                     574660    5.6%
5                     440530    4.3%
6                     200805    2.0%
7                     198570    2.0%
8                     164005    1.6%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 10172090  100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
0                     2307842   22.7%
-                     2034418   20.0%
1                     1825657   17.9%
2                     1606379   15.8%
3                     660614    6.5%
4                     574660    5.6%
5                     440530    4.3%
6                     200805    2.0%
7                     198570    2.0%
8                     164005    1.6%

Sales
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 21734
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5773.818972
Minimum: 0
Maximum: 41551
Zeros: 172871
Zeros (%): 17.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3727
Median: 5744
Q3: 7856
95-th percentile: 12137
Maximum: 41551
Range: 41551
Interquartile range (IQR): 4129

Descriptive statistics

Standard deviation: 3849.926175
Coefficient of variation (CV): 0.6667902464
Kurtosis: 1.778374747
Mean: 5773.818972
Median Absolute Deviation (MAD): 2067
Skewness: 0.6414596158
Sum: 5873180623
Variance: 14821931.55
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                 Count     Frequency (%)
0                     172871    17.0%
5674                  215       < 0.1%
5558                  197       < 0.1%
5483                  196       < 0.1%
6049                  195       < 0.1%
6214                  195       < 0.1%
5723                  194       < 0.1%
5449                  192       < 0.1%
5140                  191       < 0.1%
5489                  191       < 0.1%
Other values (21724)  842572    82.8%

Smallest values

Value                 Count     Frequency (%)
0                     172871    17.0%
46                    1         < 0.1%
124                   1         < 0.1%
133                   1         < 0.1%
286                   1         < 0.1%
297                   1         < 0.1%
316                   1         < 0.1%
416                   1         < 0.1%
506                   1         < 0.1%
520                   1         < 0.1%

Largest values

Value                 Count     Frequency (%)
41551                 1         < 0.1%
38722                 1         < 0.1%
38484                 1         < 0.1%
38367                 1         < 0.1%
38037                 1         < 0.1%
38025                 1         < 0.1%
37646                 1         < 0.1%
37403                 1         < 0.1%
37376                 1         < 0.1%
37122                 1         < 0.1%
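
The Median Absolute Deviation reported for each numeric column is a robust spread measure: the median of the absolute distances from the median. A minimal sketch of the textbook definition follows. The helper name and the toy input are illustrative only; the input is just the five Sales quantile values listed in this report, not the full column, so the result is not expected to equal the reported MAD of 2067.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


# Toy sample: the Sales 5th percentile, Q1, median, Q3 and 95th percentile.
toy_sales = [0, 3727, 5744, 7856, 12137]
print(median_absolute_deviation(toy_sales))  # 2112
```

Unlike the standard deviation, this value barely moves if the largest toy value is replaced by the column maximum of 41551, which is why MAD is useful for skewed columns such as Sales.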

Customers
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 4086
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 633.1459464
Minimum: 0
Maximum: 7388
Zeros: 172869
Zeros (%): 17.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 405
Median: 609
Q3: 837
95-th percentile: 1362
Maximum: 7388
Range: 7388
Interquartile range (IQR): 432

Descriptive statistics

Standard deviation: 464.4117339
Coefficient of variation (CV): 0.7334987083
Kurtosis: 7.091772718
Mean: 633.1459464
Median Absolute Deviation (MAD): 216
Skewness: 1.59865029
Sum: 644041755
Variance: 215678.2586
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                 Count     Frequency (%)
0                     172869    17.0%
560                   2414      0.2%
576                   2363      0.2%
603                   2337      0.2%
571                   2330      0.2%
555                   2328      0.2%
566                   2327      0.2%
517                   2326      0.2%
539                   2309      0.2%
651                   2299      0.2%
Other values (4076)   823307    80.9%

Smallest values

Value                 Count     Frequency (%)
0                     172869    17.0%
3                     1         < 0.1%
5                     1         < 0.1%
8                     1         < 0.1%
13                    1         < 0.1%
18                    1         < 0.1%
36                    1         < 0.1%
40                    1         < 0.1%
44                    1         < 0.1%
50                    1         < 0.1%

Largest values

Value                 Count     Frequency (%)
7388                  1         < 0.1%
5494                  1         < 0.1%
5458                  1         < 0.1%
5387                  1         < 0.1%
5297                  1         < 0.1%
5192                  1         < 0.1%
5152                  1         < 0.1%
5145                  1         < 0.1%
5132                  1         < 0.1%
5112                  1         < 0.1%
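
The histograms in this report use fixed-size bins (bins=50 for Customers). As a rough sketch of what that means, the function below assigns a value to one of `bins` equal-width intervals spanning the column's minimum and maximum. It assumes the common convention that the right edge belongs to the last bin; `bin_index` is a hypothetical helper, not part of pandas-profiling.

```python
def bin_index(value, lo, hi, bins):
    """Index of the equal-width bin on [lo, hi] that holds `value`."""
    if not lo <= value <= hi:
        raise ValueError("value outside histogram range")
    width = (hi - lo) / bins
    # Assumed convention: the maximum falls into the last bin, not past it.
    return min(int((value - lo) / width), bins - 1)


# Customers: Minimum 0, Maximum 7388, bins=50 (figures from this report).
lo, hi, bins = 0, 7388, 50
print(bin_index(0, lo, hi, bins))     # 0
print(bin_index(609, lo, hi, bins))   # 4, the bin holding the median
print(bin_index(7388, lo, hi, bins))  # 49
```

With a maximum of 7388 and a 95-th percentile of 1362, almost all Customers rows land in the first ten of the fifty bins, so the long right tail is barely visible at this bin width.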

Open
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

1: 844392
0: 172817

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value                 Count     Frequency (%)
1                     844392    83.0%
0                     172817    17.0%

Length

Histogram of lengths of the category

Pie chart

Value                 Count     Frequency (%)
1                     844392    83.0%
0                     172817    17.0%

Most occurring characters

Value                 Count     Frequency (%)
1                     844392    83.0%
0                     172817    17.0%

Most occurring categories

Value                 Count     Frequency (%)
Decimal Number        1017209   100.0%

Most frequent character per category

Decimal Number
Value                 Count     Frequency (%)
1                     844392    83.0%
0                     172817    17.0%

Most occurring scripts

Value                 Count     Frequency (%)
Common                1017209   100.0%

Most frequent character per script

Common
Value                 Count     Frequency (%)
1                     844392    83.0%
0                     172817    17.0%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 1017209   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
1                     844392    83.0%
0                     172817    17.0%

Promo
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

0: 629129
1: 388080

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value                 Count     Frequency (%)
0                     629129    61.8%
1                     388080    38.2%

Length

Histogram of lengths of the category

Pie chart

Value                 Count     Frequency (%)
0                     629129    61.8%
1                     388080    38.2%

Most occurring characters

Value                 Count     Frequency (%)
0                     629129    61.8%
1                     388080    38.2%

Most occurring categories

Value                 Count     Frequency (%)
Decimal Number        1017209   100.0%

Most frequent character per category

Decimal Number
Value                 Count     Frequency (%)
0                     629129    61.8%
1                     388080    38.2%

Most occurring scripts

Value                 Count     Frequency (%)
Common                1017209   100.0%

Most frequent character per script

Common
Value                 Count     Frequency (%)
0                     629129    61.8%
1                     388080    38.2%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 1017209   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
0                     629129    61.8%
1                     388080    38.2%

StateHoliday
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

0: 986159
a: 20260
b: 6690
c: 4100

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value                 Count     Frequency (%)
0                     986159    96.9%
a                     20260     2.0%
b                     6690      0.7%
c                     4100      0.4%

Length

Histogram of lengths of the category

Pie chart

Value                 Count     Frequency (%)
0                     986159    96.9%
a                     20260     2.0%
b                     6690      0.7%
c                     4100      0.4%

Most occurring characters

Value                 Count     Frequency (%)
0                     986159    96.9%
a                     20260     2.0%
b                     6690      0.7%
c                     4100      0.4%

Most occurring categories

Value                 Count     Frequency (%)
Decimal Number        986159    96.9%
Lowercase Letter      31050     3.1%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
a                     20260     65.2%
b                     6690      21.5%
c                     4100      13.2%
Decimal Number
Value                 Count     Frequency (%)
0                     986159    100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Common                986159    96.9%
Latin                 31050     3.1%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
a                     20260     65.2%
b                     6690      21.5%
c                     4100      13.2%
Common
Value                 Count     Frequency (%)
0                     986159    100.0%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 1017209   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
0                     986159    96.9%
a                     20260     2.0%
b                     6690      0.7%
c                     4100      0.4%

SchoolHoliday
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

0: 835488
1: 181721

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value                 Count     Frequency (%)
0                     835488    82.1%
1                     181721    17.9%

Length

Histogram of lengths of the category

Pie chart

Value                 Count     Frequency (%)
0                     835488    82.1%
1                     181721    17.9%

Most occurring characters

Value                 Count     Frequency (%)
0                     835488    82.1%
1                     181721    17.9%

Most occurring categories

Value                 Count     Frequency (%)
Decimal Number        1017209   100.0%

Most frequent character per category

Decimal Number
Value                 Count     Frequency (%)
0                     835488    82.1%
1                     181721    17.9%

Most occurring scripts

Value                 Count     Frequency (%)
Common                1017209   100.0%

Most frequent character per script

Common
Value                 Count     Frequency (%)
0                     835488    82.1%
1                     181721    17.9%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 1017209   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
0                     835488    82.1%
1                     181721    17.9%

StoreType
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

a: 551627
d: 312912
c: 136840
b: 15830

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: c
2nd row: a
3rd row: a
4th row: c
5th row: a

Common Values

Value                 Count     Frequency (%)
a                     551627    54.2%
d                     312912    30.8%
c                     136840    13.5%
b                     15830     1.6%

Length

Histogram of lengths of the category

Pie chart

Value                 Count     Frequency (%)
a                     551627    54.2%
d                     312912    30.8%
c                     136840    13.5%
b                     15830     1.6%

Most occurring characters

Value                 Count     Frequency (%)
a                     551627    54.2%
d                     312912    30.8%
c                     136840    13.5%
b                     15830     1.6%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      1017209   100.0%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
a                     551627    54.2%
d                     312912    30.8%
c                     136840    13.5%
b                     15830     1.6%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 1017209   100.0%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
a                     551627    54.2%
d                     312912    30.8%
c                     136840    13.5%
b                     15830     1.6%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 1017209   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
a                     551627    54.2%
d                     312912    30.8%
c                     136840    13.5%
b                     15830     1.6%

Assortment
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

a: 537445
c: 471470
b: 8294

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a
2nd row: a
3rd row: a
4th row: c
5th row: a

Common Values

Value                 Count     Frequency (%)
a                     537445    52.8%
c                     471470    46.3%
b                     8294      0.8%

Length

Histogram of lengths of the category

Pie chart

Value                 Count     Frequency (%)
a                     537445    52.8%
c                     471470    46.3%
b                     8294      0.8%

Most occurring characters

Value                 Count     Frequency (%)
a                     537445    52.8%
c                     471470    46.3%
b                     8294      0.8%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      1017209   100.0%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
a                     537445    52.8%
c                     471470    46.3%
b                     8294      0.8%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 1017209   100.0%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
a                     537445    52.8%
c                     471470    46.3%
b                     8294      0.8%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 1017209   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
a                     537445    52.8%
c                     471470    46.3%
b                     8294      0.8%

CompetitionDistance
Real number (ℝ≥0)

Distinct: 654
Distinct (%): 0.1%
Missing: 2642
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 5430.085652
Minimum: 20
Maximum: 75860
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 MiB

Quantile statistics

Minimum: 20
5-th percentile: 130
Q1: 710
Median: 2330
Q3: 6890
95-th percentile: 20390
Maximum: 75860
Range: 75840
Interquartile range (IQR): 6180

Descriptive statistics

Standard deviation: 7715.3237
Coefficient of variation (CV): 1.420847514
Kurtosis: 13.00002236
Mean: 5430.085652
Median Absolute Deviation (MAD): 1980
Skewness: 2.928534017
Sum: 5509185710
Variance: 59526219.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                 Count     Frequency (%)
250                   11120     1.1%
50                    7536      0.7%
350                   7536      0.7%
1200                  7374      0.7%
190                   7352      0.7%
90                    6594      0.6%
180                   6594      0.6%
330                   6410      0.6%
150                   6226      0.6%
2640                  5652      0.6%
Other values (644)    942173    92.6%

Smallest values

Value                 Count     Frequency (%)
20                    942       0.1%
30                    3767      0.4%
40                    4710      0.5%
50                    7536      0.7%
60                    2826      0.3%
70                    4526      0.4%
80                    2826      0.3%
90                    6594      0.6%
100                   4710      0.5%
110                   5468      0.5%

Largest values

Value                 Count     Frequency (%)
75860                 942       0.1%
58260                 942       0.1%
48330                 942       0.1%
46590                 942       0.1%
45740                 942       0.1%
44320                 942       0.1%
40860                 942       0.1%
40540                 942       0.1%
38710                 942       0.1%
38630                 942       0.1%

CompetitionOpenSinceMonth
Real number (ℝ≥0)

MISSING

Distinct: 12
Distinct (%): < 0.1%
Missing: 323348
Missing (%): 31.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.222865963
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
Median: 8
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.211832113
Coefficient of variation (CV): 0.4446755803
Kurtosis: -1.248357036
Mean: 7.222865963
Median Absolute Deviation (MAD): 3
Skewness: -0.1698616346
Sum: 5011665
Variance: 10.31586553
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common Values

Value                 Count     Frequency (%)
9                     114254    11.2%
4                     87076     8.6%
11                    84455     8.3%
3                     63548     6.2%
7                     59434     5.8%
12                    57896     5.7%
10                    55622     5.5%
6                     45444     4.5%
5                     39608     3.9%
2                     37886     3.7%
Other values (2)      48638     4.8%
(Missing)             323348    31.8%

Smallest values

Value                 Count     Frequency (%)
1                     12452     1.2%
2                     37886     3.7%
3                     63548     6.2%
4                     87076     8.6%
5                     39608     3.9%
6                     45444     4.5%
7                     59434     5.8%
8                     36186     3.6%
9                     114254    11.2%
10                    55622     5.5%

Largest values

Value                 Count     Frequency (%)
12                    57896     5.7%
11                    84455     8.3%
10                    55622     5.5%
9                     114254    11.2%
8                     36186     3.6%
7                     59434     5.8%
6                     45444     4.5%
5                     39608     3.9%
4                     87076     8.6%
3                     63548     6.2%

CompetitionOpenSinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 23
Distinct (%): < 0.1%
Missing: 323348
Missing (%): 31.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 2008.690228
Minimum: 1900
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 MiB

Quantile statistics

Minimum: 1900
5-th percentile: 2001
Q1: 2006
Median: 2010
Q3: 2013
95-th percentile: 2015
Maximum: 2015
Range: 115
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.992644444
Coefficient of variation (CV): 0.002983359187
Kurtosis: 121.934675
Mean: 2008.690228
Median Absolute Deviation (MAD): 3
Skewness: -7.539514879
Sum: 1393751810
Variance: 35.91178743
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)

Common Values

Value                 Count     Frequency (%)
2013                  75426     7.4%
2012                  74299     7.3%
2014                  63732     6.3%
2005                  56564     5.6%
2010                  51258     5.0%
2011                  49396     4.9%
2009                  49396     4.9%
2008                  48476     4.8%
2007                  43744     4.3%
2006                  42802     4.2%
Other values (13)     138768    13.6%
(Missing)             323348    31.8%

Smallest values

Value                 Count     Frequency (%)
1900                  758       0.1%
1961                  942       0.1%
1990                  4710      0.5%
1994                  1884      0.2%
1995                  1700      0.2%
1998                  942       0.1%
1999                  7352      0.7%
2000                  9236      0.9%
2001                  14704     1.4%
2002                  24882     2.4%

Largest values

Value                 Count     Frequency (%)
2015                  35060     3.4%
2014                  63732     6.3%
2013                  75426     7.4%
2012                  74299     7.3%
2011                  49396     4.9%
2010                  51258     5.0%
2009                  49396     4.9%
2008                  48476     4.8%
2007                  43744     4.3%
2006                  42802     4.2%

Promo2
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.5 MiB

1: 509178
0: 508031

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value                 Count     Frequency (%)
1                     509178    50.1%
0                     508031    49.9%

Length

Histogram of lengths of the category

Pie chart

Value                 Count     Frequency (%)
1                     509178    50.1%
0                     508031    49.9%

Most occurring characters

Value                 Count     Frequency (%)
1                     509178    50.1%
0                     508031    49.9%

Most occurring categories

Value                 Count     Frequency (%)
Decimal Number        1017209   100.0%

Most frequent character per category

Decimal Number
Value                 Count     Frequency (%)
1                     509178    50.1%
0                     508031    49.9%

Most occurring scripts

Value                 Count     Frequency (%)
Common                1017209   100.0%

Most frequent character per script

Common
Value                 Count     Frequency (%)
1                     509178    50.1%
0                     508031    49.9%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 1017209   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
1                     509178    50.1%
0                     508031    49.9%

Promo2SinceWeek
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 24
Distinct (%): < 0.1%
Missing: 508031
Missing (%): 49.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.26909254
Minimum: 1
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 13
Median: 22
Q3: 37
95-th percentile: 45
Maximum: 50
Range: 49
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.09597253
Coefficient of variation (CV): 0.6057809305
Kurtosis: -1.369928605
Mean: 23.26909254
Median Absolute Deviation (MAD): 13
Skewness: 0.1045275226
Sum: 11848110
Variance: 198.6964415
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)

Common Values

Value                 Count     Frequency (%)
14                    72990     7.2%
40                    62598     6.2%
31                    39976     3.9%
10                    38828     3.8%
5                     35818     3.5%
37                    32786     3.2%
1                     32418     3.2%
13                    29820     2.9%
45                    29268     2.9%
22                    28694     2.8%
Other values (14)     105982    10.4%
(Missing)             508031    49.9%

Smallest values

Value                 Count     Frequency (%)
1                     32418     3.2%
5                     35818     3.5%
6                     942       0.1%
9                     12452     1.2%
10                    38828     3.8%
13                    29820     2.9%
14                    72990     7.2%
18                    27318     2.7%
22                    28694     2.8%
23                    4342      0.4%

Largest values

Value                 Count     Frequency (%)
50                    942       0.1%
49                    758       0.1%
48                    8294      0.8%
45                    29268     2.9%
44                    2642      0.3%
40                    62598     6.2%
39                    4732      0.5%
37                    32786     3.2%
36                    9236      0.9%
35                    22814     2.2%

Promo2SinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 508031
Missing (%): 49.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.752774
Minimum: 2009
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.5 MiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2011
Median: 2012
Q3: 2013
95-th percentile: 2014
Maximum: 2015
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.662870431
Coefficient of variation (CV): 0.0008265779235
Kurtosis: -1.04066228
Mean: 2011.752774
Median Absolute Deviation (MAD): 1
Skewness: -0.1200599167
Sum: 1024340254
Variance: 2.765138069
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common Values

Value                 Count     Frequency (%)
2011                  115056    11.3%
2013                  110464    10.9%
2014                  79922     7.9%
2012                  73174     7.2%
2009                  65270     6.4%
2010                  56240     5.5%
2015                  9052      0.9%
(Missing)             508031    49.9%

Smallest values

Value                 Count     Frequency (%)
2009                  65270     6.4%
2010                  56240     5.5%
2011                  115056    11.3%
2012                  73174     7.2%
2013                  110464    10.9%
2014                  79922     7.9%
2015                  9052      0.9%

Largest values

Value                 Count     Frequency (%)
2015                  9052      0.9%
2014                  79922     7.9%
2013                  110464    10.9%
2012                  73174     7.2%
2011                  115056    11.3%
2010                  56240     5.5%
2009                  65270     6.4%

PromoInterval
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 508031
Missing (%): 49.9%
Memory size: 15.5 MiB

Jan,Apr,Jul,Oct: 293122
Feb,May,Aug,Nov: 118596
Mar,Jun,Sept,Dec: 97460

Length

Max length: 16
Median length: 15
Mean length: 15.19140654
Min length: 15

Characters and Unicode

Total characters: 7735130
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jan,Apr,Jul,Oct
2nd row: Jan,Apr,Jul,Oct
3rd row: Jan,Apr,Jul,Oct
4th row: Jan,Apr,Jul,Oct
5th row: Feb,May,Aug,Nov

Common Values

Value                 Count     Frequency (%)
Jan,Apr,Jul,Oct       293122    28.8%
Feb,May,Aug,Nov       118596    11.7%
Mar,Jun,Sept,Dec      97460     9.6%
(Missing)             508031    49.9%

Length

Histogram of lengths of the category

Pie chart

Value                 Count     Frequency (%)
jan,apr,jul,oct       293122    57.6%
feb,may,aug,nov       118596    23.3%
mar,jun,sept,dec      97460     19.1%

Most occurring characters

Value                 Count     Frequency (%)
,                     1527534   19.7%
J                     683704    8.8%
a                     509178    6.6%
u                     509178    6.6%
A                     411718    5.3%
n                     390582    5.0%
p                     390582    5.0%
r                     390582    5.0%
c                     390582    5.0%
t                     390582    5.0%
Other values (13)     2140908   27.7%

Most occurring categories

Value                 Count     Frequency (%)
Lowercase Letter      4170884   53.9%
Uppercase Letter      2036712   26.3%
Other Punctuation     1527534   19.7%

Most frequent character per category

Lowercase Letter
Value                 Count     Frequency (%)
a                     509178    12.2%
u                     509178    12.2%
n                     390582    9.4%
p                     390582    9.4%
r                     390582    9.4%
c                     390582    9.4%
t                     390582    9.4%
e                     313516    7.5%
l                     293122    7.0%
b                     118596    2.8%
Other values (4)      474384    11.4%
Uppercase Letter
Value                 Count     Frequency (%)
J                     683704    33.6%
A                     411718    20.2%
O                     293122    14.4%
M                     216056    10.6%
F                     118596    5.8%
N                     118596    5.8%
S                     97460     4.8%
D                     97460     4.8%
Other Punctuation
Value                 Count     Frequency (%)
,                     1527534   100.0%

Most occurring scripts

Value                 Count     Frequency (%)
Latin                 6207596   80.3%
Common                1527534   19.7%

Most frequent character per script

Latin
Value                 Count     Frequency (%)
J                     683704    11.0%
a                     509178    8.2%
u                     509178    8.2%
A                     411718    6.6%
n                     390582    6.3%
p                     390582    6.3%
r                     390582    6.3%
c                     390582    6.3%
t                     390582    6.3%
e                     313516    5.1%
Other values (12)     1827392   29.4%
Common
Value                 Count     Frequency (%)
,                     1527534   100.0%

Most occurring blocks

Value                 Count     Frequency (%)
ASCII                 7735130   100.0%

Most frequent character per block

ASCII
Value                 Count     Frequency (%)
,                     1527534   19.7%
J                     683704    8.8%
a                     509178    6.6%
u                     509178    6.6%
A                     411718    5.3%
n                     390582    5.0%
p                     390582    5.0%
r                     390582    5.0%
c                     390582    5.0%
t                     390582    5.0%
Other values (13)     2140908   27.7%
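
The PromoInterval tables quote two different sets of percentages for the same three values (28.8% / 11.7% / 9.6% and 57.6% / 23.3% / 19.1%). The arithmetic below suggests the first set uses all rows as the denominator and the second only the non-missing rows; that reading is an inference from the numbers, not something the report states. The same counts also reproduce the reported mean length and total character count.

```python
# Counts copied from the PromoInterval entry in this report.
counts = {
    "Jan,Apr,Jul,Oct": 293_122,
    "Feb,May,Aug,Nov": 118_596,
    "Mar,Jun,Sept,Dec": 97_460,
}
missing = 508_031

non_missing = sum(counts.values())
total = non_missing + missing

# Share over all rows vs. share over rows that have a value.
share_all = {k: round(100 * c / total, 1) for k, c in counts.items()}
share_present = {k: round(100 * c / non_missing, 1) for k, c in counts.items()}

# Length statistics follow from the string lengths (15, 15 and 16).
total_chars = sum(len(k) * c for k, c in counts.items())
mean_length = total_chars / non_missing

print(share_all)
print(share_present)
print(total_chars, round(mean_length, 8))
```

The non-missing total (509178) also equals the number of rows with Promo2 = 1, which fits the high correlation the report flags between Promo2 and PromoInterval.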

Interactions

[Figures: 81 interaction plots (Matplotlib v3.3.4)]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r (only the sign of the slope does).

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
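
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used to produce this report; the function name `pearson_r` is an assumption, and the sample values are the Sales and Customers columns of the first five rows in the Sample section.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # raises ZeroDivisionError for a constant input

sales = [5263, 6064, 8314, 13995, 4822]
customers = [555, 625, 821, 1498, 559]
print(pearson_r(sales, customers))
```

Rescaling or shifting either input, for example `[2 * s + 5 for s in sales]`, leaves the result unchanged up to floating-point rounding, which is the invariance noted above.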

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
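
As a sketch of that recipe: rank each variable, then apply the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are assumptions made for illustration, and giving tied values the average of their ranks is one common convention that this report does not itself specify.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but nonlinear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For the cubic relationship in the example, ρ reaches 1 while Pearson's r stays below 1, which illustrates why ρ is the better detector of nonlinear monotonic association.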

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
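
The pair-counting definition translates directly into code. The sketch below implements exactly the formula stated above (often called tau-a); the function name `kendall_tau_a` is an assumption, and library implementations frequently use a tie-adjusted variant (tau-b) instead, so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(zip(xs, ys), 2))
    if not pairs:
        raise ValueError("need at least 2 observations")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y: counted in neither group
    return (concordant - discordant) / len(pairs)

print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The loop is quadratic in the number of observations, which is fine for illustration but too slow for a table with about a million rows.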

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
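
A minimal sketch of the bias-corrected estimator, assuming the usual form of Bergsma's correction: the chi-squared statistic divided by n is reduced by (k-1)(r-1)/(n-1), and the numbers of rows r and columns k are shrunk in the same spirit. The function name `cramers_v_corrected` and the toy inputs are assumptions for illustration; this is not the code that produced the matrix in this report.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    # A variable with a single level carries no information: report 0.
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

store_type = ["a", "b"] * 50
assortment = ["x", "y"] * 50  # perfectly determined by store_type
print(cramers_v_corrected(store_type, assortment))
```

Clamping the corrected φ² at 0 keeps the coefficient within the 0 to 1 range described above, even when the uncorrected estimate is smaller than its expected bias.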

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
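
Two of these views can be approximated by hand. The sketch below counts missing values per column and computes a nullity correlation as the Pearson correlation of the "is missing" indicators, which is an assumption about how the heatmap is derived. The function names are hypothetical, `None` stands in for NaN, and the three columns are copied from the ten "Last rows" of the Sample section.

```python
from math import sqrt

def nullity_counts(columns):
    """Number of missing (None) entries per column."""
    return {name: sum(v is None for v in values) for name, values in columns.items()}

def nullity_correlation(a, b):
    """Pearson correlation of the missing-value indicators of two columns."""
    ia = [1.0 if v is None else 0.0 for v in a]
    ib = [1.0 if v is None else 0.0 for v in b]
    n = len(ia)
    mean_a, mean_b = sum(ia) / n, sum(ib) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ia, ib))
    var_a = sum((x - mean_a) ** 2 for x in ia)
    var_b = sum((y - mean_b) ** 2 for y in ib)
    if var_a == 0 or var_b == 0:
        return None  # a column that is always or never missing has no defined correlation
    return cov / sqrt(var_a * var_b)

columns = {
    "CompetitionOpenSinceYear": [2011, 2012, 2004, 2011, 2010, 2014, 2006, None, None, None],
    "Promo2SinceWeek": [31, 13, None, 22, None, 31, None, None, None, 22],
    "PromoInterval": ["Jan,Apr,Jul,Oct", "Jan,Apr,Jul,Oct", None, "Jan,Apr,Jul,Oct", None,
                      "Jan,Apr,Jul,Oct", None, None, None, "Mar,Jun,Sept,Dec"],
}
print(nullity_counts(columns))
print(nullity_correlation(columns["Promo2SinceWeek"], columns["PromoInterval"]))
```

In these ten rows Promo2SinceWeek and PromoInterval are missing in exactly the same places, so their nullity correlation is 1; ten rows are far too few to say anything about the full dataset.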

Sample

First rows

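(index) | Store | DayOfWeek | Date | Sales | Customers | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
0 | 1 | 5 | 2015-07-31 | 5263 | 555 | 1 | 1 | 0 | 1 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN
1 | 2 | 5 | 2015-07-31 | 6064 | 625 | 1 | 1 | 0 | 1 | a | a | 570.0 | 11.0 | 2007.0 | 1 | 13.0 | 2010.0 | Jan,Apr,Jul,Oct
2 | 3 | 5 | 2015-07-31 | 8314 | 821 | 1 | 1 | 0 | 1 | a | a | 14130.0 | 12.0 | 2006.0 | 1 | 14.0 | 2011.0 | Jan,Apr,Jul,Oct
3 | 4 | 5 | 2015-07-31 | 13995 | 1498 | 1 | 1 | 0 | 1 | c | c | 620.0 | 9.0 | 2009.0 | 0 | NaN | NaN | NaN
4 | 5 | 5 | 2015-07-31 | 4822 | 559 | 1 | 1 | 0 | 1 | a | a | 29910.0 | 4.0 | 2015.0 | 0 | NaN | NaN | NaN
5 | 6 | 5 | 2015-07-31 | 5651 | 589 | 1 | 1 | 0 | 1 | a | a | 310.0 | 12.0 | 2013.0 | 0 | NaN | NaN | NaN
6 | 7 | 5 | 2015-07-31 | 15344 | 1414 | 1 | 1 | 0 | 1 | a | c | 24000.0 | 4.0 | 2013.0 | 0 | NaN | NaN | NaN
7 | 8 | 5 | 2015-07-31 | 8492 | 833 | 1 | 1 | 0 | 1 | a | a | 7520.0 | 10.0 | 2014.0 | 0 | NaN | NaN | NaN
8 | 9 | 5 | 2015-07-31 | 8565 | 687 | 1 | 1 | 0 | 1 | a | c | 2030.0 | 8.0 | 2000.0 | 0 | NaN | NaN | NaN
9 | 10 | 5 | 2015-07-31 | 7185 | 681 | 1 | 1 | 0 | 1 | a | a | 3160.0 | 9.0 | 2009.0 | 0 | NaN | NaN | NaN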

Last rows

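(index) | Store | DayOfWeek | Date | Sales | Customers | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
1017199 | 1106 | 2 | 2013-01-01 | 0 | 0 | 0 | 0 | a | 1 | a | c | 5330.0 | 9.0 | 2011.0 | 1 | 31.0 | 2013.0 | Jan,Apr,Jul,Oct
1017200 | 1107 | 2 | 2013-01-01 | 0 | 0 | 0 | 0 | a | 1 | a | a | 1400.0 | 6.0 | 2012.0 | 1 | 13.0 | 2010.0 | Jan,Apr,Jul,Oct
1017201 | 1108 | 2 | 2013-01-01 | 0 | 0 | 0 | 0 | a | 1 | a | a | 540.0 | 4.0 | 2004.0 | 0 | NaN | NaN | NaN
1017202 | 1109 | 2 | 2013-01-01 | 0 | 0 | 0 | 0 | a | 1 | c | a | 3490.0 | 4.0 | 2011.0 | 1 | 22.0 | 2012.0 | Jan,Apr,Jul,Oct
1017203 | 1110 | 2 | 2013-01-01 | 0 | 0 | 0 | 0 | a | 1 | c | c | 900.0 | 9.0 | 2010.0 | 0 | NaN | NaN | NaN
1017204 | 1111 | 2 | 2013-01-01 | 0 | 0 | 0 | 0 | a | 1 | a | a | 1900.0 | 6.0 | 2014.0 | 1 | 31.0 | 2013.0 | Jan,Apr,Jul,Oct
1017205 | 1112 | 2 | 2013-01-01 | 0 | 0 | 0 | 0 | a | 1 | c | c | 1880.0 | 4.0 | 2006.0 | 0 | NaN | NaN | NaN
1017206 | 1113 | 2 | 2013-01-01 | 0 | 0 | 0 | 0 | a | 1 | a | c | 9260.0 | NaN | NaN | 0 | NaN | NaN | NaN
1017207 | 1114 | 2 | 2013-01-01 | 0 | 0 | 0 | 0 | a | 1 | a | c | 870.0 | NaN | NaN | 0 | NaN | NaN | NaN
1017208 | 1115 | 2 | 2013-01-01 | 0 | 0 | 0 | 0 | a | 1 | d | c | 5350.0 | NaN | NaN | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec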